SPECIFICATION



APPARATUS AND METHOD FOR DISCRIMINATING DOCUMENTS 
CROSS-REFERENCE TO RELATED APPLICATIONS 
This application is a continuation application and is
based upon PCT/JP99/05398, filed on September 30, 1999.

Technical Field 

The present invention relates to a document
discriminating apparatus and a document discriminating
method suitable for use in processing documents at
financial institutions and, more particularly, to an
apparatus and method for discriminating documents such
as privately prepared slips in various formats.
Background Art 

In recent years, image data readers such as optical
character readers (OCRs) have been developed as devices
that read character information as image data for
character recognition, and such readers are now widely
used in various industries to improve job efficiency.

For example, operators at the windows of financial
institutions attempt to perform their jobs more
efficiently by processing documents with image data
readers.

In particular, to increase efficiency in jobs such as
document processing, it is necessary not only to process
a large number of documents of the same type but also to
automatically process a large number of documents in
various formats.

To cope with this, document processing devices
provided with an image data reader are used. The image
data reader of the document processing device reads
image data on a document under the control of an
electronic computer. Image scanners and facsimile
machines, for example, are used as the image data
reader. Furthermore, the image data reader can be an
image data read and recognition device which can both
read image data and recognize characters.
In addition, the electronic computer functioning as
a controller for the image data reader is constituted by
an input means, such as a keyboard or a mouse, for
inputting instructions and data from the operator, a
computer main body, and a display for displaying data or
control information. Recognition of the image data read
by the image data reader is performed by the electronic
computer main body.
Furthermore, the document processing device is
provided with a hard disk, which is connected to the
electronic computer and stores in advance, for each type
of document, position information of the character data
to be recognized and information designating the types
and numbers of characters (hereinafter referred to as
"definition information").
Next, the operation performed when the document
processing device is used will be described.

In recognizing character data written on, for
example, an "ELECTRIC RATE BILL" using the image data
reader, the operator first operates the keyboard and
designates definition information B corresponding to the
type of the document (in this case, the document is an
electric rate bill, Document B).

Following this, the electronic computer accesses the
hard disk to fetch the designated definition information
B for the document and notifies the image data reader of
that information.

The image data reader can then read the image data
and recognize the characters based on the definition
information B, which is the control information notified
from the computer.

However, in this method of processing documents,
since definition information is designated by the
operator for each document to be read, the operator bears
an additional work load, and designation errors may occur
as the number of pieces of definition information
increases. Moreover, where the operator is required to
process thousands of types of documents, designation by
the operator is practically impossible.

To cope with this, there has been proposed a method
for automatically reading documents without the operator
designation described above, by writing in advance a
specific ID number at a predetermined position on each
type of document so that documents of that type can be
discriminated from others.

According to this method, when the image data of the
document is read by the image data reader, character
recognition is made possible by first identifying the
ID number at the predetermined position and then using
the definition information (in this case, B)
corresponding to the ID number.

However, if the position at which a document is set
on the optical reading portion of the image data reader
changes when image data is read, then even for a
document identical to one whose definition information
has already been stored on the hard disk, the coordinates
of images such as character data areas and graphic areas
measured from the reference point (the physical origin)
do not match the coordinates in the definition
information, and it is determined that the two are not
in the same layout.

In the document processing device, character
recognition of the image data is carried out after the
layout of the read image data has been matched to the
layout of the definition information, so character
recognition may not be performed properly if the layouts
cannot be matched. For this reason, the reference points
of the respective images are first extracted, the
coordinates of the respective images measured from those
reference points are compared with each other, and
whether or not the layouts match is determined.
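The comparison described above can be sketched as follows, assuming that each layout is given as a list of element positions (for example, the upper-left corners of character data areas) paired by list order, and that a small pixel tolerance is allowed. The names `to_relative`, `layouts_match` and `tol` are hypothetical and serve only to illustrate why measuring from the reference point makes the check independent of where the slip was set on the reader.

```python
from typing import List, Tuple

Point = Tuple[int, int]

def to_relative(points: List[Point], ref: Point) -> List[Point]:
    """Express absolute image coordinates relative to a reference point."""
    rx, ry = ref
    return [(x - rx, y - ry) for x, y in points]

def layouts_match(read_pts: List[Point], read_ref: Point,
                  def_pts: List[Point], def_ref: Point, tol: int = 2) -> bool:
    """True if each read element lies within `tol` pixels of the corresponding
    defined element once both are measured from their own reference points.
    Elements are paired by list order (an assumption of this sketch)."""
    if len(read_pts) != len(def_pts):
        return False
    pairs = zip(to_relative(read_pts, read_ref), to_relative(def_pts, def_ref))
    return all(abs(ax - bx) <= tol and abs(ay - by) <= tol
               for (ax, ay), (bx, by) in pairs)
```

For example, a slip read 25 pixels right and 40 pixels lower than when it was defined still matches, because the reference point and the areas shift together; comparing against the physical origin instead would report a mismatch.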

There are the following two methods for extracting a
reference point for image data. As preconditions, the
document to be read is a pre-printed document, and the
position at which the document is printed on the form is
controlled with high accuracy.

In the first method, where image data of a document
is read with an image data reader that can distinguish
the end face of the document from the reading background,
the upper left-hand corner of the end face of the form,
for example, is regarded as the reference point.
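A minimal sketch of the first method follows, assuming a grayscale image in which the reader background is dark and the paper is bright, so that the end face of the form is visible in the image. The function `paper_corner` and the `paper_min` brightness level are hypothetical; a skewed sheet would need additional handling, since the sketch simply returns the top-left corner of the paper's bounding box.

```python
from typing import List, Optional, Tuple

def paper_corner(image: List[List[int]], paper_min: int = 128) -> Optional[Tuple[int, int]]:
    """Return (row, col) of the upper left-hand corner of the paper region.

    Assumes the reader background is dark and the paper is bright, i.e. the
    reader can tell the end face of the form from its background. The corner
    is taken as the top-left of the paper's bounding box.
    """
    if not image:
        return None
    width = len(image[0])
    # Rows and columns that contain at least one paper (bright) pixel.
    rows = [r for r, line in enumerate(image) if any(v >= paper_min for v in line)]
    cols = [c for c in range(width) if any(line[c] >= paper_min for line in image)]
    if not rows or not cols:
        return None  # no paper found in the image
    return rows[0], cols[0]
```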

In the second method, where image data of a document
is read with an image scanner or a facsimile machine,
since the reading background cannot be distinguished from
the end face of the form, a reference mark is printed in
advance on the document form to be read, and this
reference mark is extracted from the image data to serve
as the reference point.

In this second method, since the position of the
reference point is printed in advance as the reference
mark, there is an advantage that the reference point can
be extracted stably even if the position where the
document is set on the reader changes.

However, in the document processing device, even
where the reference point is extracted accurately using
the aforesaid methods, if the image data is not read in
the correct direction, the characters in the image data
are not oriented correctly, and no character recognition
process can be carried out.

For example, if a money transfer request slip on
which information is written horizontally is read from
the wrong direction by the image data read and
recognition device, the image data of the slip is
displayed on the display in the wrong direction.

For this reason, in recognizing characters with the
document processing device, the operator determines by
looking at the display whether or not the document has
been read from the right direction. If the reading
direction is determined to be wrong, the operator inputs
from the keyboard an instruction to rotate the image data
of the document by 90 or 180 degrees, and a rotational
correction process must be carried out on the image data
so that it is displayed on the display in the right
direction.
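The rotational correction itself is a simple image transformation. The sketch below, which assumes the image is held as a list of pixel rows, rotates clockwise in quarter turns; the names `rotate90_cw` and `rotate` are illustrative and not part of the specification.

```python
from typing import List, TypeVar

T = TypeVar("T")

def rotate90_cw(img: List[List[T]]) -> List[List[T]]:
    """Rotate a row-major image 90 degrees clockwise."""
    # Reversing the row order and then transposing gives a clockwise quarter turn.
    return [list(col) for col in zip(*img[::-1])]

def rotate(img: List[List[T]], degrees: int) -> List[List[T]]:
    """Rotate clockwise by a multiple of 90 degrees (negative means counterclockwise)."""
    if degrees % 90 != 0:
        raise ValueError("only multiples of 90 degrees are supported")
    for _ in range((degrees // 90) % 4):
        img = rotate90_cw(img)
    return img
```

A slip read upside down is corrected with `rotate(img, 180)`; one read sideways uses `rotate(img, 90)` or `rotate(img, -90)`.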

Incidentally, in jobs of transferring money to
accounts at banks, the operator conventionally inputs
through the keyboard the account number, names and amount
of money written on a document which is a money transfer
request slip, and in doing so performs the input
operation while looking at the document and the display
alternately.

However, when the operator performs the input
operation while looking at the document and the display
alternately, the eyes move frequently, which impairs
visibility and may cause errors in matching an item on
the document with an item on the display.

To cope with this, there has been proposed a
document processing device which reads a document with an
image data read and recognition device and displays on
the display both the results of character recognition of
the image data so read and the image data itself.

According to this document processing device, since
the information written on the document can be seen
directly on the display, eye movement is reduced, and
errors in matching the character recognition results
with the image data can be reduced when the character
recognition results are confirmed and modified.

On the other hand, in data processing jobs at
financial institutions, another method has recently
become mainstream, in which a client-server system is
used to process data collectively and at high speed at
the server.

For example, it has been proposed that clients
installed at the respective branches (sales points) and a
server installed at a regional center of a financial
institution be connected via dedicated lines or public
lines to form a client-server system, so that document
processing is carried out collectively by the server to
increase the efficiency of document processing jobs.

When the server batch-processes the documents as
described above, the amount of data collected at the
server becomes enormous, making it impossible for the
operator to perform document processing at the server.
Therefore, a system must be designed in which the server
can process documents automatically without operator
involvement.

To this end, applying a document discriminating
process that uses documents bearing ID numbers to this
client-server system enables the server to automatically
discriminate the types of documents and read them.

In addition, at the clients of this client-server
system, as described above, the results of character
recognition of the image data read by the image data
reader can be displayed on the display together with the
image data itself. The information written on the
documents can thus be seen directly on the display,
reducing errors in matching the character recognition
results with the image data when the character
recognition results are confirmed and modified.

However, with the aforesaid method for processing
documents, only documents prepared exclusively for the
document processing device can be read by the device, and
general documents that have conventionally been in use
cannot be used. Therefore, special documents have to be
prepared for use with such a document processing device.

Additionally, the first reference point extracting
method described above requires the print position on the
document form to be controlled with high accuracy. In the
case of a document printed with, for example, a word
processor, however, paper is set manually and the print
position tends to vary each time printing is carried out,
so using the upper left-hand corner of the end face of
the form as the reference point is not suitable for
documents printed with a word processor.

Furthermore, with the second reference point
extracting method described above, the document to be
read must be a special document on which the reference
mark is printed, so the reference point cannot be
extracted from general documents on which no reference
mark is printed.

In addition, even with the method adopted in the
recognition technology used on document readers, in which
a specific point on the face of the document is used as
the reference point, there is the problem that the method
cannot function effectively unless the layout of the
document to be read can be identified to some extent.

Additionally, even when a document identical to one
that has already been read is read again, the newly read
image may not match the previously read image because of
dust or a faint spot, and in that case the same reference
point may not be extracted.
Furthermore, in order for the rotational correction
processing to be carried out automatically at the server,
character recognition is actually performed on the read
image data, and whether or not the reading direction of
the document is correct is determined by whether or not
the characters can be recognized. This considerably
reduces the gain in efficiency of the document processing
job.

In addition, in the aforesaid method in which the
character recognition results are confirmed and modified
by displaying them on the display together with the image
data itself, matching between the image data and the
items to be confirmed is performed visually, so when the
image data contains many items to be confirmed,
recognition errors cannot be eliminated.
Furthermore, when the image data cannot be displayed
on the display at one time, the display screen has to be
scrolled by pressing a predetermined key in order to
refer to data located earlier, which makes the operation
more complicated.
Disclosure of the Invention

The present invention was made in view of these
problems, and an object thereof is to provide a document
discriminating apparatus and a document discriminating
method for processing general documents that have
conventionally been used and that have various formats,
such as privately prepared documents.

Another object of the present invention is to
extract a reference point on the image data of a printed
document stably and automatically at all times, even for
a document printed on ordinary paper with a word
processor, without using the conventional end face of the
form or a reference mark.
A document discriminating apparatus according to the
present invention, which can attain these objects,
comprises: an image reading means for reading image data
from a document prepared in an arbitrary format; an image
data cutting out means for cutting out, from the image
data read by the image reading means, data corresponding
to a designated specific portion of the document; a color
constituent extracting means for analyzing the color
constituents of the image data cut out by the cutting out
means and setting a color separation parameter for a
specific color constituent; and a color constituent
separating means for producing data information for the
specific portion from the cut-out image data based on the
color separation parameter from the color constituent
extracting means.
Here, the color constituents are analyzed in terms of
the three primary colors, and one of the three primary
colors is selected as the specific color constituent. The
color separation parameter for the specific color
constituent is determined based on the density
distributions of the three primary colors.
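How a color separation parameter might be derived from the density distributions can be sketched as follows. This is only an illustration under assumed rules: the cut-out region is a list of rows of (R, G, B) pixels, the channel whose values are most spread out (largest population standard deviation) is taken as the specific color constituent, and that channel's mean is used as the separation threshold. The functions `color_separation_parameter` and `separate` are hypothetical; the specification does not state which statistic of the distributions is used.

```python
from statistics import mean, pstdev
from typing import List, Tuple

RGB = Tuple[int, int, int]
Region = List[List[RGB]]

def color_separation_parameter(region: Region) -> Tuple[int, float]:
    """Return (channel, threshold) for a non-empty cut-out region.

    channel: 0=R, 1=G, 2=B, chosen as the primary color whose density values
    are most spread out (assumed rule). threshold: mean of that channel.
    """
    pixels = [px for row in region for px in row]
    best = max(range(3), key=lambda c: pstdev([p[c] for p in pixels]))
    return best, mean(p[best] for p in pixels)

def separate(region: Region, channel: int, threshold: float) -> List[List[int]]:
    """Produce binary data information: 1 where the chosen channel is darker."""
    return [[1 if px[channel] < threshold else 0 for px in row] for row in region]
```

For red characters on white paper, the green and blue channels separate ink from background far better than the red channel, so one of them is selected and the characters come out as 1s in the binary pattern.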

The document is then discriminated by cutting out
image data corresponding to a plurality of designated
specific locations from the read image data, preparing
data information for each specific portion from the
cut-out image data based on the color separation
parameters, and comparing the data information with the
data information stored in a document discriminating
dictionary unit.

Consequently, with the document discriminating
apparatus according to the present invention, accurate
data information can be prepared from the cut-out image
data when the type of a document whose image data is read
with the image data read and recognition device is
discriminated automatically. Therefore, even if a
plurality of types of documents are read with the image
reader in a mixed fashion, the operator can process the
documents without paying attention to the definition of
each document, which increases job efficiency. In
addition, since no ID number for discriminating the
document needs to be written on the document itself,
general documents can be used, which makes it easy to
work with existing systems.
Brief Description of the Drawings 
Fig. 1 is a schematic diagram showing the construction
of a document discriminating apparatus.

Fig. 2 is a block diagram showing a document
discriminating apparatus which constitutes the base of
the present invention.

Fig. 3 is a diagram for explaining the operation of
the document discriminating apparatus which constitutes
the base of the present invention when a registration
step is performed.

Fig. 4 is a control block diagram of the document
discriminating apparatus which constitutes the base of
the present invention, focusing on the registration
step.

Fig. 5 is a flowchart for explaining the operation
of the document discriminating apparatus which
constitutes the base of the present invention when the
registration step is performed.

Fig. 6 is a control block diagram of the document
discriminating apparatus according to the present
invention, focusing on a document determination step.

Fig. 7 is a flowchart for explaining the operation
of the document discriminating apparatus which
constitutes the base of the present invention, focusing
on the document determination step.

Fig. 8 is a diagram showing an example in which
color constituent extraction and separation parameters
Fig. 9 is a control block diagram of an embodiment
of the present invention, focusing on the registration
step of the document discriminating apparatus.

Fig. 10 is a flowchart for explaining the operation
of the embodiment of the present invention, focusing on
the registration step of the document discriminating
apparatus.

Fig. 11 is a control diagram of another embodiment
of the present invention, focusing on a document
discriminating step of a document discriminating
apparatus, and

Fig. 12 is a flowchart for explaining the operation
of the embodiment of the present invention, focusing on
the document discriminating step of the document
discriminating apparatus.
Mode for Carrying out the Invention 

To clarify the effectiveness of the present
invention, the construction of a document discriminating
apparatus which constitutes the base of the present
invention will first be described.

Fig. 1 is a functional block diagram showing the
overall construction of a document discriminating
apparatus. The document discriminating apparatus
comprises an image data reader 101 for reading image data
on a document, an electronic computer 102 for controlling
the reading operation of image data, a hard disk 103
connected to the electronic computer 102 for storing in
advance, for each type of document, position information
of the character data to be recognized and information
designating the type and number of characters
(hereinafter referred to as "definition information"), a
display 104 for displaying data and control information,
and an input means 105, such as a keyboard or a mouse,
for inputting instructions and data from an operator. The
image data reader 101 reads image data from a document
106 such as an "ELECTRIC RATE BILL." Note that the image
data reader 101 includes, for example, an image scanner
or a facsimile machine as an image data reading portion.

For the document discriminating apparatus
constructed as described above, in order to deal with the
aforesaid problems, a document discriminating apparatus
is proposed for processing general documents that have
conventionally been used and that have various formats,
such as privately prepared documents.

In addition, a document discriminating apparatus is
proposed which always extracts, stably and automatically,
a reference point on the image data of a printed
document, even for a document printed on ordinary paper
with a word processor, without using the conventional end
face of the form or a reference mark.

The proposed document discriminating apparatus will
now be described with reference to the appended drawings.

Fig. 2 is a block diagram showing the document
discriminating apparatus. Like the apparatus of Fig. 1,
the document discriminating apparatus shown in Fig. 2
comprises an image data reader 101, an electronic
computer 102 (which, as described below, comprises an
input unit 105, a display 104 and a control unit 201) and
a hard disk 103.

Here, the image data reader 101 reads image data of
a document. As with the image data reader shown in
Fig. 1, an optical character reader (OCR) or an image
scanner can be used as the image data reader.

In addition, the control unit 201 processes the
image data of a document read with the image data reader
101 as document data, and can be constituted by the CPU
and memory of the electronic computer 102.

Connected to this control unit 201 are the input
unit 105, such as a keyboard or a mouse, for inputting
data or instructions from an operator, and the display
104 for displaying the image data read with the image
data reader 101.

Furthermore, the hard disk (a file memory) 103 is 
designed to store the image data of all documents that 
are read with the image data reader 101. 

As shown in Fig. 2, the control unit 201 comprises,
as functional blocks, an image data storing memory 202,
an image data cutting out unit 203, a document
discriminating dictionary unit 204, a data comparison
unit 205, a threshold setting unit 206, a document
determination unit 207, a definition storing unit 208, a
definition storing table 209, a character recognition
unit 210 and a character recognition result storing unit
211.

The image data storing memory 202 temporarily stores
the image data of a document read with the image data
reader 101. The image data cutting out unit 203 functions
as a document discriminating information extraction
means which, upon receiving a designation of the
information to be extracted (discriminating information)
through the operator's operation of the input unit 105,
extracts the required document discriminating
information written on a document from the image data of
the document stored in the image data storing memory 202.

Here, when the image data cutting out unit 203
extracts the required document discriminating
information, the image data of the document read with the
image data reader 101 is displayed on the display 104,
and the operator can designate the discriminating
information based on the image displayed on the display
104.

Note that the operator can designate any information
written on a document, such as character information,
marks, seals or ruled lines, as the information (unique
information) to be extracted at the image data cutting
out unit 203. Coordinate position information for the
designated information, size information of the written
information, and data information are then extracted
automatically as document discriminating information at
the image data cutting out unit 203, for example by
software or firmware processing.
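The automatic extraction of position, size and data information can be sketched as follows, assuming the page has already been separated into a binary image (1 = ink) and the operator's designation arrives as a rough rectangle `(left, top, right, bottom)` with exclusive right and bottom edges. The `UniqueInfo` record and `extract_unique_info` function are hypothetical names; tightening the designated rectangle to the ink bounding box is one plausible way to obtain position and size, not necessarily the one used by the cutting out unit 203.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class UniqueInfo:
    position: Tuple[int, int]  # (x, y) of the upper-left ink pixel, page coordinates
    size: Tuple[int, int]      # (width, height) of the ink bounding box
    data: List[List[int]]      # binary pattern inside the bounding box

def extract_unique_info(page: List[List[int]],
                        box: Tuple[int, int, int, int]) -> Optional[UniqueInfo]:
    """Tighten an operator-designated rectangle to the ink it contains."""
    left, top, right, bottom = box
    ink = [(x, y) for y in range(top, bottom) for x in range(left, right) if page[y][x]]
    if not ink:
        return None  # nothing written inside the designated area
    xs = [x for x, _ in ink]
    ys = [y for _, y in ink]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    data = [row[x0:x1 + 1] for row in page[y0:y1 + 1]]
    return UniqueInfo((x0, y0), (x1 - x0 + 1, y1 - y0 + 1), data)
```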

Furthermore, the document discriminating dictionary
unit 204 registers the document discriminating
information extracted at the image data cutting out unit
203 as document discriminating information for a specific
document.

Specifically, as shown in Fig. 3, document
discriminating information for a document type A having
the ID number '0101' is stored in an area 204a, whereas
document discriminating information for a document type B
having the ID number '0102' is stored in an area 204b.
Information for further document types is stored
sequentially in the same way in accordance with their ID
numbers.
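The storage layout of Fig. 3 can be pictured as a mapping from ID number to an area holding several pieces of unique information. The sketch below is a hypothetical in-memory stand-in for the document discriminating dictionary unit 204; the class names, the duplicate-ID check and the use of plain Python lists are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class UniqueInfo:
    position: Tuple[int, int]  # (x, y) of the designated area
    size: Tuple[int, int]      # (width, height)
    data: List[List[int]]      # binary pattern (data information)

@dataclass
class DocumentArea:
    name: str
    # Corresponds to the "document discriminating information 1, 2, ..." columns.
    items: List[UniqueInfo] = field(default_factory=list)

class DiscriminatingDictionary:
    """Hypothetical stand-in for the document discriminating dictionary unit."""

    def __init__(self) -> None:
        self._areas: Dict[str, DocumentArea] = {}

    def register(self, doc_id: str, name: str, items: List[UniqueInfo]) -> None:
        if doc_id in self._areas:
            raise KeyError(f"document type {doc_id} is already registered")
        self._areas[doc_id] = DocumentArea(name, list(items))

    def area(self, doc_id: str) -> DocumentArea:
        return self._areas[doc_id]

    def ids(self) -> List[str]:
        return sorted(self._areas)  # areas listed in ID-number order
```

In the Fig. 3 situation, document type A would already be registered under '0101', and registering the electric rate bill adds area '0102' with two items, one for 'ELECTRIC RATE' and one for 'FUJI ICHIRO'.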

Here, the operation of registering document
discriminating information for a specific document will
be described. Fig. 3 shows a state in which the
registration of the document type A has been completed
and the document type B, an "ELECTRIC RATE BILL," is
about to be registered next. A document 106 of the
document type B read with the image data reader 101 is
displayed on the display 104. Via the input means 105,
the operator designates a plurality of locations that can
serve as features for discriminating the document 106
(unique information).

The figure shows a state in which 'ELECTRIC RATE'
written on the document 106 is designated as unique
information D1 and 'FUJI ICHIRO' as unique information
D2. Position information (X0, Y0), size information, and
data information such as characters are read for
'ELECTRIC RATE', and these pieces of information are
stored as a single piece of unique information in the
document discriminating information 1 column of the area
204b, which is the storage location for the document type
B in the document discriminating dictionary unit 204.
Likewise, the unique information D2, 'FUJI ICHIRO', is
stored in the document discriminating information 2
column of the area 204b. In this way, the plurality of
pieces of unique information on the feature portions
required to specify a single document type are stored.

Consequently, the registration step, in which the
document discriminating information written on the
specific document is extracted from the image data of the
specific document read with the image data reader 101 and
registered in the document discriminating dictionary unit
204, is carried out by the image data storing memory 202,
the image data cutting out unit 203 and the document
discriminating dictionary unit 204.

Note that, while the image data of the document read
with the image data reader 101 is temporarily stored in
the image data storing memory 202 when the document
discriminating information is registered in the document
discriminating dictionary unit 204, the image data of the
whole document is stored on the hard disk 103.

In addition, the data comparison unit 205 functions
as a verification means for verifying whether or not the
document discriminating information registered in the
document discriminating dictionary unit 204 exists in the
image data of a specific document read out from the image
data storing memory 202. The data comparison unit 205
also functions as a reference means for checking whether
or not the document discriminating information registered
in the document discriminating dictionary unit 204 exists
in the image data of an arbitrary document read with the
image data reader 101 and stored in the image data
storing memory 202.

Furthermore, the document determination unit 207
functions as a determination means for determining, based
on the results of the verification at the data comparison
unit 205, whether or not the specific document can be
recognized, that is, whether or not the specific document
can be identified as a whole. The document determination
unit 207 also functions as a document discriminating
means for discriminating whether or not an arbitrary
document is the specific document based on the result of
the reference at the data comparison unit 205 functioning
as the reference means.

Specifically, the data comparison unit 205
calculates the degree of matching for the image data
input from the image data reader 101 by collating the
information extracted at the image data cutting out unit
203 with the corresponding document discriminating
information from the document discriminating dictionary
unit 204. The document determination unit 207,
functioning as the document discriminating means, then
determines whether or not the document of the image data
input from the image data reader 101 can be discriminated
by comparing the degree of matching of the document
discriminating information from the data comparison unit
205 with a threshold from the threshold setting unit 206.
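The interplay of the data comparison unit 205 and the document determination unit 207 can be sketched as follows. The sketch assumes that each registered document type is stored as a list of `(box, pattern)` pairs over a binary page image, that the degree of matching is the average fraction of agreeing pixels over all pairs, and that the threshold is a fraction such as 0.9. All of these, as well as the function names, are hypothetical choices; the specification does not define how the degree of matching is computed.

```python
from typing import Dict, List, Optional, Tuple

Pattern = List[List[int]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive

def crop(page: Pattern, box: Box) -> Pattern:
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]

def agreement(a: Pattern, b: Pattern) -> float:
    """Fraction of equal pixels in two binary patterns; 0.0 if their sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0
    total = sum(len(row) for row in a)
    if total == 0:
        return 0.0
    same = sum(pa == pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return same / total

def degree_of_matching(page: Pattern, items: List[Tuple[Box, Pattern]]) -> float:
    """Average agreement over all registered pieces of unique information."""
    if not items:
        return 0.0
    return sum(agreement(crop(page, box), pat) for box, pat in items) / len(items)

def discriminate(page: Pattern, dictionary: Dict[str, List[Tuple[Box, Pattern]]],
                 threshold: float = 0.9) -> Optional[str]:
    """Return the ID of the best-matching registered document type, or None if
    even the best degree of matching falls below the threshold."""
    best_id, best_score = None, -1.0
    for doc_id, items in dictionary.items():
        score = degree_of_matching(page, items)
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id if best_score >= threshold else None
```

Returning `None` rather than the closest type corresponds to the case in which the document cannot be discriminated and would presumably be referred back to the operator.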

Consequently, the verification step, which
determines whether or not a specific document can be
discriminated, is performed by the document
discriminating dictionary unit 204, the data comparison
unit 205, the threshold setting unit 206 and the document
determination unit 207.

In addition, when the document determination unit
207 evaluates the degree of matching against the
threshold information from the threshold setting unit
206, the threshold is set so that the determination can
absorb errors that occur during the reading operation of
the image data reader 101 and printing errors on the
document itself.

The definition storing unit 208 reads out, from the definition storing table 209, definition information for recognizing the data described on the document, and stores this information temporarily. This storing is performed during system operation when the document determination unit 207 recognizes a document read with the image data reader 101 as a specific document already registered in the document discriminating dictionary unit 204.

The definition storing table 209 holds the definition information (for example, read position information, character attribute information, and the number of characters or digits to be read) used for character recognition of the contents described on a specific document, corresponding to the document discriminating information registered in the document discriminating dictionary unit 204.
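The definition storing table 209 can be pictured as a mapping from a registered document type to the fields that should be read from it. The sketch below is only an illustration: the `ReadField` structure, its coordinates and the attribute names are hypothetical, and only the ID '0102' comes from the description of document type B.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ReadField:
    """One entry of definition information (field names are hypothetical)."""
    position: Tuple[int, int, int, int]  # read position as (x, y, width, height)
    attribute: str                       # character attribute, e.g. "numeric"
    length: int                          # number of characters or digits to read


# Hypothetical contents of the definition storing table 209, keyed by document type ID
DEFINITION_TABLE: Dict[str, Tuple[ReadField, ...]] = {
    "0102": (
        ReadField((120, 40, 200, 30), "numeric", 8),        # e.g. the amount field
        ReadField((120, 90, 240, 30), "alphanumeric", 20),  # e.g. the payer name
    ),
}


def load_definition(doc_type_id: str) -> Tuple[ReadField, ...]:
    """Return the definition information for a document type that has been specified."""
    try:
        return DEFINITION_TABLE[doc_type_id]
    except KeyError:
        raise LookupError(f"no definition registered for document type {doc_type_id}") from None


for entry in load_definition("0102"):
    print(entry.attribute, entry.length)
```

Keeping the definitions keyed by the same ID as the discriminating information means the recognizer only needs the specified type to find out what to read.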

The character recognition unit 210 performs character recognition on the image data stored in the image data storing memory 202 for a document that has been recognized as a specific document registered in the document discriminating dictionary unit 204, in accordance with the corresponding definition information when that definition information is inputted from the definition storing unit 208.

Furthermore, the character recognition result storing unit 211 stores the character information recognized at the character recognition unit 210.

The operation of the document discriminating apparatus constructed as described above will now be described separately for a registration step and a document discriminating step.
[Processing in Registration Step] 

First, the operation of the document discriminating apparatus in the registration step will be described with reference to Fig. 3, the control block diagram of Fig. 4, which focuses on the activation of the registration step, and the flowchart of Fig. 5, which explains the operation performed when the registration step is activated.

As shown in Fig. 3, when the image data of a document 106 (for example, an electric rate bill) is read with the image data reader 101 through the operation of the operator (step S501), the image data so read is temporarily stored in the image data storing memory 202 (step S502) and is also stored in the hard disk 103, so that all the image data read with the image data reader 101 is stored (step S503).

Note that, as shown in Fig. 3, the image data read with the image data reader 101 is displayed on the display 104 (step S504).

If the image data stored in the image data storing memory 202 and the hard disk 103 is the first image data ever read for that document, the document discriminating information is stored in the document discriminating dictionary unit 204 as described below.

That is, the operator operates the input unit 105 while referring to the display 104 to designate, to the image data cutting out unit 203, a plurality of pieces of unique information to be extracted (step S505).

When the unique information described on the document is designated, the image data cutting out unit 203 automatically extracts position information, size information and data information related to the unique information from the image data of the document stored in the image data storing memory 202 (step S506), and these are then registered in the document discriminating dictionary unit 204 as the document discriminating information (step S507).
Taking the case shown in Fig. 3 as an example, by operating the input unit 105 the operator designates the electric rate, which shows the details of the amount of money to be paid and is described on the bill, as first unique information D1, and 'FUJI ICHIRO', which indicates the payer, as second unique information D2. The image data cutting out unit 203 then extracts and stores the position information, size information and data information of the first unique information, and likewise those of the second unique information. When the operator designates further pieces of unique information, the corresponding information is extracted sequentially at the image data cutting out unit 203.

As a result, the extracted document discriminating information is stored in the area 204b of the document discriminating dictionary unit 204 as the document discriminating information of document type B, to which the ID number '0102' is affixed.

Note that in this document discriminating apparatus, the image data cut out at the image data cutting out unit 203 is used only for discriminating the document.

In addition, in this document discriminating apparatus, registering a plurality of pieces of document discriminating information per document ensures that the document can be discriminated in the verification step and the document discriminating step without any normalization processing of the image data.
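As a minimal sketch of steps S505 to S507, assuming a binary page image held as nested lists and a rectangle designated by the operator, the cutting out and registration could be modelled as follows. The class and method names are hypothetical; only the ID '0102' and the idea of registering several locations per document come from the text above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Image = List[List[int]]  # binary page image: rows of 0/1 pixels (assumed)


@dataclass
class DiscriminatingInfo:
    """Position, size and data information for one designated location."""
    x: int
    y: int
    width: int
    height: int
    data: Image


@dataclass
class DiscriminatingDictionary:
    """Stand-in for the document discriminating dictionary unit 204."""
    entries: Dict[str, List[DiscriminatingInfo]] = field(default_factory=dict)

    def register(self, doc_id: str, page: Image, x: int, y: int, w: int, h: int) -> DiscriminatingInfo:
        """Cut out the rectangle the operator designated and store it under doc_id."""
        if x < 0 or y < 0 or w <= 0 or h <= 0:
            raise ValueError("invalid designated area")
        patch = [row[x:x + w] for row in page[y:y + h]]
        if len(patch) != h or any(len(row) != w for row in patch):
            raise ValueError("designated area lies outside the page")
        info = DiscriminatingInfo(x, y, w, h, patch)
        self.entries.setdefault(doc_id, []).append(info)  # several locations per document
        return info


page = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0], [1, 1, 0, 0]]
d = DiscriminatingDictionary()
d.register("0102", page, 1, 0, 2, 2)  # e.g. the electric rate area D1
d.register("0102", page, 0, 3, 2, 1)  # e.g. the payer name area D2
print(len(d.entries["0102"]))  # 2
```

Only the raw cut-out pixels are stored here, which mirrors the statement that no normalization of the image data is performed.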

[Operation in Document Discrimination Step] 
Next, the operation of the document discriminating apparatus according to the embodiment of the present invention in the document discriminating step will be described with reference to the control block diagram of Fig. 6, which focuses on the activation of the document discriminating step, and the flowchart of Fig. 7, which explains the operation performed when the document discriminating step is activated.

In the verification step, it can be verified, for the images of all the documents stored in the hard disk 103, whether or not each document can be specified using the document discriminating information registered in the document discriminating dictionary unit 204. Once the verification is complete, when actual document discrimination is carried out, the operation described below is performed on the image data of an arbitrary document as a discrimination step for specifying the type of the document.

That is, when the operator operates the image data reader 101 to read the image data of a document (step S701), the image data so read is temporarily stored in the image data storing memory 202 (step S702).

Next, the image data cutting out unit 203 cuts out a plurality of pieces of image data (discriminating information) from the image data temporarily stored in the image data storing memory 202, based on the position information and size information constituting the document discriminating information of the document type sequentially selected from the document discriminating dictionary unit 204 (steps S703, S704).

The data comparison unit 205 then performs a comparison determination by calculating the degree of matching between the data information of all the image data cut out at the image data cutting out unit 203 and the data information constituting the document discriminating information (step S705).

Furthermore, the document determination unit 207 determines whether or not the type of the document whose image data was read with the image data reader 101 can be specified with the document discriminating information from the document discriminating dictionary unit 204, by comparing the degree of matching calculated as the comparison determination result at the data comparison unit 205 with the degree-of-matching criterion set at the threshold setting unit 206 (step S706).

In making this determination at the document determination unit 207, if any of the plurality of pieces of image data cut out at the image data cutting out unit 203 does not match the document discriminating information from the document discriminating dictionary unit 204, the document is determined to be of a different type.

Specifically, in step S706, when the image data of the first document type A has been read from the image data reader 101 and the document discriminating information sequentially read out from the document discriminating dictionary unit 204 for comparison relates to the first document type A, the result is determined as matched (Y), and the read image data is specified as that matching document type. The specified document type is then recorded in a memory (not shown) within the control unit 201 (step S707).

On the other hand, when the image data of the first document type A has been read from the image data reader 101 but the document discriminating information sequentially read out from the document discriminating dictionary unit 204 for comparison relates to the second document type B, the result is determined as unmatched (N). In this case, since the document type cannot be specified, there is no need to record the result, and the flow bypasses step S707 and proceeds directly to step S708. If the degree of matching has not yet been determined for all the document types in the document discriminating dictionary unit 204 (N), the flow returns to step S703, where the degree of matching is determined again using the document discriminating information related to another document type from the document discriminating dictionary unit 204.

Thereafter, as in the examples above, the degree of matching of the document discriminating information is determined for the image data read with the image data reader 101 against the document discriminating information for each document type stored in the document discriminating dictionary unit 204 (step S708).

If, in step S709, the image data read with the image data reader 101 can be specified as a single document type (Y), the specified document type is notified to the operator and output to the definition storing unit 208 (step S710).

Conversely, if the image data cannot be specified as a single document type (N), the operator is notified that the document cannot be specified, for example by a message to that effect displayed on the display 104 (step S711).
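The loop of steps S703 to S711 can be sketched as below. The sketch assumes binary patches and a pixel-agreement degree of matching (both assumptions, as is the default threshold), and the function names are hypothetical. A type is accepted only when every registered location matches, and a result is reported only when exactly one type is accepted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Image = List[List[int]]                   # binary image: rows of 0/1 pixels (assumed)
Entry = Tuple[int, int, int, int, Image]  # (x, y, width, height, registered data)


def match_ratio(cut: Image, registered: Image) -> float:
    """Assumed degree of matching: fraction of agreeing pixels (0.0 if shapes differ)."""
    if len(cut) != len(registered) or any(len(a) != len(b) for a, b in zip(cut, registered)):
        return 0.0  # the cut-out ran off the page, so treat it as a mismatch
    total = sum(len(row) for row in registered)
    if total == 0:
        return 0.0
    same = sum(1 for ra, rb in zip(cut, registered) for p, q in zip(ra, rb) if p == q)
    return same / total


def discriminate(page: Image, dictionary: Dict[str, Sequence[Entry]],
                 threshold: float = 0.9) -> Optional[str]:
    """Return the single matching document type ID, or None if zero or several match."""
    matched: List[str] = []
    for doc_id, entries in dictionary.items():            # S703: take each type in turn
        all_match = True
        for x, y, w, h, registered in entries:
            cut = [row[x:x + w] for row in page[y:y + h]]  # S704: cut out at the stored position
            if match_ratio(cut, registered) < threshold:   # S705/S706: every location must match
                all_match = False
                break
        if all_match and entries:
            matched.append(doc_id)                        # S707: record the matching type
    return matched[0] if len(matched) == 1 else None      # S709: S710 if unique, else S711


page = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 1, 1]]
dictionary = {
    "0101": [(0, 0, 2, 2, [[1, 1], [1, 1]]), (2, 3, 2, 1, [[1, 1]])],
    "0102": [(0, 0, 2, 2, [[0, 0], [0, 0]])],
}
print(discriminate(page, dictionary))  # 0101
```

Returning None when several types match reflects the requirement that the document be specified as a single type before its definition information is used.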

When the document type specified as a single type is inputted to the definition storing unit 208, the definition information (read position information, character attribute information, and the number of characters and/or digits to be read) corresponding to the specified document type is read out from the definition storing table 209.

The character recognition unit 210 then recognizes, in accordance with this definition information, the character information described on the document whose image data was read with the image data reader 101 and stored in the image data storing memory 202, and the resulting character information is stored in the character recognition result storing unit 211.

Consequently, in the discriminating step, the image data of an arbitrary document is read with the image data reader 101, and whether or not that document is the specific document is discriminated by referring to whether or not the document discriminating information registered in the document discriminating dictionary unit 204 is present in its image data, after which character recognition can be carried out.

Thus, the aforesaid document discriminating apparatus comprises the image data reader 101, the image data storing memory 202, the hard disk 103, the image data cutting out unit 203, the document discriminating dictionary unit 204, the data comparison unit 205 and the document determination unit 207. With this construction, the type of a document whose image data is read with the image data reader 101 can be discriminated automatically. Even when documents of several types are read with the image data reader 101 in a mixed batch, the operator can process them without paying attention to the definition of each document, which increases the efficiency of the document processing job. Furthermore, since no ID number for discriminating the document needs to be printed on the document itself, general types of documents can be used, and the document discriminating apparatus of the invention can therefore be applied to an existing system without difficulty.

In addition, when document discriminating information is registered in the document discriminating dictionary unit 204, the required information is taken in automatically through designation by the operator, who can make the designation while looking at the image data of the documents to be registered on the display 104. This facilitates the preparation of the document discriminating dictionary and further increases the efficiency of the document processing job.

Additionally, the document discriminating information extracted by the operator's designation can be designated at a plurality of locations, and designating a plurality of locations on the document specifies it more accurately than designating a single location.

Thus, the proposed document discriminating apparatus can also discriminate documents in various formats, such as privately prepared slips.

However, many of the privately prepared documents in use in recent years are colored. A logo mark, which is one of the features best suited to discriminating documents, is often colored. In addition, documents of the same format are sometimes colored differently for different applications.

Of course, with the aforesaid document discriminating apparatus, document discriminating information can be obtained by designating a colored portion as a feature for discriminating the document. However, when the plurality of pieces of document discriminating information is registered for each document, even if a colored portion is designated, the discriminating information obtained from it is processed as monochromatic information, because the computer processes the information in binary form, and the monochromatic features of the documents are then compared with each other. The color information is thus replaced by monochromatic information, and the effectiveness of the colored features is reduced.

Moreover, when the background of a document is colored, the contrast is lowered and the extraction accuracy of the discriminating information drops as well. In the case of a color-printed document, tones also tend to shift because of printing and/or reading errors, colors blend into one another, and the document is subject to thin spots and blurring. For these reasons, comparing monochromatic images lowers the discriminating accuracy, and even if digitized color images are simply compared with each other, the differences become large and accurate discrimination cannot be achieved.

Consequently, even when the color information carries a feature useful for discriminating documents, the colored portion cannot be used as effective discriminating information. If the color information could be used in discriminating documents, the discriminating capability would clearly be improved considerably.

To cope with this, an embodiment of the present invention provides a document discriminating apparatus that can discriminate documents stably and accurately even when the color changes, by means of a construction in which the data information of the document discriminating information used for discriminating documents can be extracted based on color information.

Next, the principle of extracting discriminating information based on color information according to the embodiment of the present invention will be described with reference to Fig. 8.

It is well known that a color can be separated into constituents of the three primary colors, and the present invention also adopts this three-primary-color principle. For simplicity of description, however, only two color constituents are shown in Fig. 8.

In Fig. 8, the horizontal axis denotes, for example, a red constituent R, whose density increases in the direction of arrow R, and the vertical axis denotes a green constituent G, whose density increases upward in the direction of arrow G. Although no blue constituent is shown in Fig. 8, it can be represented along the direction normal to the plane of the figure. The origin O therefore denotes white, and as the density of the respective constituents increases, the color approaches black. Note that the axes in the figure are drawn on different scales.
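The specification does not say how the reader's output is turned into the density axes of Fig. 8. One simple assumption, for an 8-bit RGB reader that reports reflected intensity, is to take the density of each constituent as the complement of its intensity, which puts white at the origin and black at the maximum of all three axes. The function name and the 255 scale are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def to_density(rgb: RGB, max_level: int = 255) -> RGB:
    """Convert reader intensities (max_level = full reflectance) into (R, G, B) densities.

    White paper maps to the origin and black ink to the maximum of every axis,
    matching the orientation of the axes described for Fig. 8.
    """
    if any(not 0 <= c <= max_level for c in rgb):
        raise ValueError("channel value out of range")
    r, g, b = (max_level - c for c in rgb)
    return (r, g, b)


print(to_density((255, 255, 255)))  # (0, 0, 0): white at the origin
print(to_density((0, 0, 0)))        # (255, 255, 255): black at the far corner
```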

As described with reference to Fig. 3, when a document is registered, the document 106 is read with the image data reader 101, the image of the document is displayed on the display 104, feature portions of the document are designated with the input unit 105, and the document discriminating information is extracted. At this time, either the whole result of reading with the image data reader 101 may be displayed in color on the screen, or only the result of reading the extracted portion may be displayed in color.

Fig. 8 shows a state in which, for example, the 'ELECTRIC RATE' area on document type B is designated as the object for extraction and the image data of the designated portion is read. When the respective color constituents of this image data are analyzed, they appear as a plurality of dots in the figure. Since the object for extraction is a printed document, analyzing its color constituents yields a discontinuous distribution. The figure illustrates a case in which the object for extraction is multicolored.

In the density distribution of each constituent shown in Fig. 8, looking at the green constituent G and the red constituent R, the group g of dots enclosed by a broken line has a green constituent G that is markedly denser than that of the other groups r1, r2 and r3, so the group g can be distinguished from the other groups. In other words, the color constituents of the group g are closer to the color image of the extracted portion.

The present invention makes use of the fact that the group g can be distinguished from the other groups r1, r2 and r3. For example, the intermediate points between the group g and each of the other groups r1, r2 and r3 are obtained, and a solid line a passing through these intermediate points is drawn. Taking this solid line a as a boundary, the whole area is divided into an area A on the green constituent G side and an area B on the non-green side. This boundary serves as a color separation parameter. When the green constituent G is analyzed from the extracted image data with the color separation parameter set for the green constituent G, a green constituent lying in area A is judged to be data and is adopted for the discriminating information as data information equivalent to monochromatic data. Conversely, a green constituent lying in area B is not adopted as discriminating information. Therefore, to use the color information for discrimination, the separated color constituent is designated and stored in the discriminating information dictionary area 204b as a condition specifying area A, or the boundary a is stored in the discriminating information dictionary area 204b as the separation parameter.
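The construction of the boundary a can be sketched in two dimensions (R and G densities, as in Fig. 8). This is an illustration under stated assumptions rather than the patented procedure: each group is summarized by its centroid, the line through the intermediate points is fitted by orthogonal least squares, and with a single other group the sketch falls back to the perpendicular bisector. The returned triple (nx, ny, c) is a hypothetical encoding of the color separation parameter, and the sample dot coordinates are invented.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (R density, G density), as in Fig. 8
Param = Tuple[float, float, float]   # boundary nx*R + ny*G = c, area A on the positive side


def centroid(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def separation_parameter(g: Sequence[Point], others: Sequence[Sequence[Point]]) -> Param:
    """Fit boundary a through the intermediate points between group g and each other group."""
    if not g or not others or any(not grp for grp in others):
        raise ValueError("need group g and at least one non-empty other group")
    cg = centroid(g)
    crs = [centroid(grp) for grp in others]
    mids: List[Point] = [((cg[0] + cr[0]) / 2, (cg[1] + cr[1]) / 2) for cr in crs]
    mx, my = centroid(mids)
    if len(mids) == 1:
        # One neighbouring group: use the perpendicular bisector of the two centroids.
        nx, ny = cg[0] - crs[0][0], cg[1] - crs[0][1]
        norm = math.hypot(nx, ny)
        if norm == 0:
            raise ValueError("groups coincide; no boundary can be drawn")
        nx, ny = nx / norm, ny / norm
    else:
        # Orthogonal least-squares line through the intermediate points.
        sxx = sum((x - mx) ** 2 for x, _ in mids)
        syy = sum((y - my) ** 2 for _, y in mids)
        sxy = sum((x - mx) * (y - my) for x, y in mids)
        theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the fitted line
        nx, ny = -math.sin(theta), math.cos(theta)    # unit normal to that line
    c = nx * mx + ny * my
    if nx * cg[0] + ny * cg[1] < c:                   # orient so group g lies in area A
        nx, ny, c = -nx, -ny, -c
    return nx, ny, c


def in_area_a(p: Point, param: Param) -> bool:
    """True if a dot lies on the group-g side of boundary a, i.e. it counts as data."""
    nx, ny, c = param
    return nx * p[0] + ny * p[1] > c


g = [(20, 200), (30, 210), (25, 190)]  # dots dense in G
others = [[(200, 30), (210, 40)], [(120, 20), (130, 30)], [(60, 10), (70, 20)]]
param = separation_parameter(g, others)
print(all(in_area_a(p, param) for p in g))                      # True
print(any(in_area_a(p, param) for grp in others for p in grp))  # False
```

Storing only (nx, ny, c) keeps the dictionary entry small; the raw dot distribution is needed only while the operator is choosing the boundary.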

When color images are compared with each other directly, they cannot be identified as matching unless their colors coincide completely. However, by designating the specific color constituent and setting the boundary a, any occurrence of that constituent within area A is treated as discriminating information converted into monochromatic data, even when a printing or reading error occurs. The features of the extracted portion can therefore be specified sufficiently, and the influence of tone variation, thin spots and blurring can be eliminated. Moreover, the data are unaffected by the background color.

Although discrimination based on color information has been described, the above extraction principle can also be applied to a monochromatic object for extraction. According to the three-primary-color principle, white represents a state in which none of the three primary constituents is present, and black represents a state in which all three are present at their maximum levels. When the extracted portion is monochromatic, the color constituent extraction will clearly show the density distributions of all three primary constituents near their maximum values, so the data can be extracted as discriminating information by selecting and designating any one of the three constituents.
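The monochromatic case can be illustrated with a deliberately simplified separation rule: a fixed density threshold on a single designated constituent, standing in for the boundary a. Assuming densities on a 0 (white) to 255 (black) scale, a black dot is selected whichever constituent is designated, while white paper never is. The function name, scale and sample values are hypothetical.

```python
from typing import List, Tuple

Density = Tuple[int, int, int]  # (R, G, B) densities: 0 = white, 255 = black (assumed 8-bit)


def extract(pixels: List[List[Density]], constituent: int, threshold: int) -> List[List[int]]:
    """Binarize a cut-out area on one designated constituent (1 where its density >= threshold).

    An axis-aligned threshold is a simplification of boundary a, used here only
    to show why black dots survive whichever constituent is designated.
    """
    if constituent not in (0, 1, 2):
        raise ValueError("constituent must be 0 (R), 1 (G) or 2 (B)")
    return [[1 if px[constituent] >= threshold else 0 for px in row] for row in pixels]


area = [[(250, 245, 252), (0, 0, 0), (40, 220, 30)]]  # black dot, white paper, dot dense in G
for k in range(3):
    print(k, extract(area, k, 128))
```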

Thus, even when the object for extraction contains color and monochromatic information in a mixed fashion, discriminating information including the monochromatic information can be obtained, as described above, by setting the color separation parameter for the designated color constituent.

The designation of the color constituent and the setting of the separation parameter are carried out while the extraction results for the extracted portion are plotted on the diagram of Fig. 8 shown on the screen of the display 104. Extraction of color information therefore does not need to be performed on a new document: as long as the information can be read adequately with the image data reader 101, color information can be extracted from a used document and registered as discriminating information.

Next, a document discriminating apparatus according to the embodiment of the present invention that adopts the above color extraction principle will be described with reference to Figs. 9 to 12.

The basic construction of the control unit of the document discriminating apparatus according to this embodiment is similar to the block construction shown in Fig. 2. It differs from that construction in that the control unit further includes a color constituent extracting unit and a color constituent separating unit that implement the above color extraction principle, and in that the color separation parameter and the corresponding data information are additionally stored in each item of document discriminating information in the document discriminating dictionary unit.

Although a separate input unit 105' is provided for color constituent extraction, the input unit 105 may be used for that purpose instead.

The operation of the document discriminating apparatus according to this embodiment will now be described separately for the registration step and the document discriminating step.

[Process in Registration Step] 
Like the control block diagram of Fig. 4, Fig. 9 shows a control block diagram focusing in particular on the activation of the registration step in the document discriminating apparatus according to the present invention.

The control block shown in Fig. 9, which focuses on the activation of the registration step, is constituted by the image storing memory 202, the image data cutting out unit 203 and the document discriminating dictionary unit 204. The image storing memory 202 stores the data obtained when a document is read with the image reader 101. As in the control block of Fig. 4, a hard disk 103 that stores the data read from all documents is connected. However, the document discriminating apparatus according to the present invention is further characterized in that the color constituent extracting unit 213 and the color constituent separating unit 214 are additionally provided.

The operation of the document discriminating apparatus in the registration step will be described with reference to Fig. 3 and to Fig. 10, a flowchart explaining the operations performed when the registration step is activated.

As shown in Fig. 3, when the image data of a document 106 (for example, an electric rate bill) is read with the image data reader 101 through the operation of the operator (step S1001), the image data so read is temporarily stored in the image data storing memory 202 and is also stored in the hard disk 103, so that all the image data read with the image data reader 101 is stored (step S1002).

Note that the image data read with the image data reader 101 is displayed on the display 104, as shown in Fig. 3 (step S1003).

If the image data stored in the image data storing memory 202 and the hard disk 103 is the first image data ever read for a document, the document discriminating information is stored in the document discriminating dictionary unit 204 as described below.

That is, while referring to the display 104, the operator operates the input unit 105 to designate to the image cutting out unit 203 the areas of the unique information to be extracted (step S1004). For example, as shown in Fig. 3, 'ELECTRIC RATE', the first piece of document discriminating information, is designated by enclosing it in a frame.

Since designating the document discriminating information specifies its position (X0, Y0) and size, the image cutting out unit 203 automatically cuts out, from the image data storing memory 202, the image data for extraction corresponding to the unique information described on the document (step S1005).

Next, the cut-out image data for extraction is sent to the color constituent extracting unit 213, which analyzes the color information in the cut-out image data, prepares a density distribution for each color constituent as shown in Fig. 8 following the color constituent extraction principle described above, and displays the distribution on the display 104 (step S1006). The operator can designate the display conditions through the input unit 105'.

The operator then examines the density distributions of the respective color constituents shown on the display 104, selects a characteristic color constituent, and designates that color constituent and the associated boundary a to the color constituent extracting unit 213 by operating the input unit 105'. The color constituent extracting unit 213 receives this designation and determines a color separation parameter (step S1007).

The determined color separation parameter is then sent to the color constituent separating unit 214, which extracts data information from the cut-out image data in accordance with the color separation parameter (step S1008).

Next, the position information, size information, color separation parameter and data information of the discriminating information prepared as described above are registered in the document discriminating information column of the corresponding document in the document discriminating dictionary unit 204 (step S1009).
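Steps S1005, S1008 and S1009 can be sketched together for a single designated area. The sketch assumes pixels already expressed as (R, G, B) densities and, for brevity, replaces boundary a with a hypothetical per-constituent threshold; the function and class names are illustrative only, and the ID '0102' is taken from the example of document type B. The operator's choice at step S1007 is passed in as the `param` argument.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Density = Tuple[int, int, int]       # (R, G, B) densities, 0 = white (assumed 8-bit scale)
ColorImage = List[List[Density]]


@dataclass(frozen=True)
class SeparationParameter:
    """Hypothetical, simplified color separation parameter (axis-aligned boundary)."""
    constituent: int   # 0 = R, 1 = G, 2 = B
    threshold: int     # density at or above which a dot counts as data


@dataclass
class ColorDiscriminatingInfo:
    x: int
    y: int
    width: int
    height: int
    separation: SeparationParameter
    data: List[List[int]]            # binary data information after separation


def register_area(dictionary: Dict[str, List[ColorDiscriminatingInfo]], doc_id: str,
                  page: ColorImage, x: int, y: int, w: int, h: int,
                  param: SeparationParameter) -> ColorDiscriminatingInfo:
    """Sketch of steps S1005, S1008 and S1009 for one designated area."""
    if x < 0 or y < 0 or w <= 0 or h <= 0:
        raise ValueError("invalid designated area")
    cut = [row[x:x + w] for row in page[y:y + h]]                  # S1005: cut out
    if len(cut) != h or any(len(row) != w for row in cut):
        raise ValueError("designated area lies outside the page")
    data = [[1 if px[param.constituent] >= param.threshold else 0 for px in row]
            for row in cut]                                        # S1008: separate
    info = ColorDiscriminatingInfo(x, y, w, h, param, data)
    dictionary.setdefault(doc_id, []).append(info)                 # S1009: register
    return info


page = [
    [(0, 0, 0), (20, 200, 30), (0, 0, 0)],
    [(250, 250, 250), (10, 210, 20), (0, 0, 0)],
]
dictionary: Dict[str, List[ColorDiscriminatingInfo]] = {}
info = register_area(dictionary, "0102", page, 1, 0, 2, 2, SeparationParameter(1, 128))
print(info.data)  # [[1, 0], [1, 0]]
```

Calling `register_area` once per extraction object corresponds to repeating steps S1004 to S1009 for each designated area of the same document.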

When a plurality of extraction objects are designated, steps S1004 to S1009 are repeated for each extraction object to obtain its discriminating information, and the discriminating information so obtained for the plurality of extraction objects is registered in the document discriminating information of the corresponding document in the document discriminating dictionary unit 204.

Referring again to Fig. 3, for example, the operator operates the input unit 105 to designate the electric rate, which indicates the details of the amount of money to be paid on the bill, as first unique information, and 'FUJI ICHIRO', which indicates the payer of the bill, as second unique information. The position information, size information, color separation parameter and data information of the first unique information, and likewise those of the second unique information, are then extracted at the image data cutting out unit 203 and stored in an area 204a of the document discriminating dictionary unit 204.

In this way, information related to a plurality of pieces of unique information is likewise extracted for a document, and the resulting document discriminating information is stored in an area 204b of the document discriminating dictionary unit 204 as the document discriminating information of document type B, to which the ID number '0102' is affixed.

Note that in this document discriminating apparatus, the image data cut out at the image data cutting out unit 203 is used only for discrimination of the document.
[Operation in Document Discriminating Step]

Similar to the control block diagram shown in Fig. 6, Fig. 11 shows a control block diagram focusing on the operation when the document discriminating step is activated.

The control block shown in Fig. 11 comprises an image storing memory 202, an image data cutting out unit 203, a document discriminating dictionary unit 204, a data comparison unit 205, a threshold setting unit 206 and a document determination unit 207. It is similar to the control block of Fig. 6 in that the image storing memory 202 stores the data obtained when a document is read with the image reader 101. However, the control block shown in Fig. 11 is characterized in that a color constituent separating unit 214 is added to the document discriminating apparatus according to this embodiment. The color constituent extracting unit 213, which is included in the control block shown in Fig. 9 used when the registration step is activated, is not included in the control block shown in Fig. 11, since no discriminating information needs to be registered in the latter.

The operation of the document discriminating apparatus according to the embodiment of the present invention when the document discriminating step is activated will be described below with reference to a flowchart of that operation.

As described previously, in the verification step it is verified whether the images of all the documents stored in the hard disk 103 can be identified using the document discriminating information registered in the document discriminating dictionary unit 204. When the verification is completed, the operation described below as the discriminating step, which identifies the type of a document, is performed at the time of actual document discrimination.

First, the image data of a document is read when the operator operates the image data reader 101. The image data so read is then temporarily stored in the image data storing memory 202 (step S1201).
Next, the image data cutting out unit 203 reads out, from the document discriminating dictionary unit 204, the document discriminating information of each document type in turn, in the order in which the document types are stored (step S1202).
Following this, the relevant data is cut out from the image data temporarily stored in the image data storing memory 202, based on the position information and size information which constitute the document discriminating information to be extracted for the first document type (step S1203).

Then, the color constituent separating unit 214 reads out a color separation parameter from the document discriminating information of the first document type in the document discriminating dictionary unit 204, and this color separation parameter is applied to the data cut out in step S1203. Here, the color constituent designated for this cut out image data is separated, and data information is extracted following the aforesaid extraction principle (step S1204).

The data information prepared in step S1204 is sent to the data comparison unit 205, where the extracted data information is compared with the data information read out from the document discriminating information of the first document type, and the degree of matching with the data information is calculated (step S1205).
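For one piece of discriminating information, steps S1203 to S1205 amount to cutting out the stored rectangle, applying the stored color separation parameter, and scoring the result against the registered data information. The sketch below is a pure-Python illustration under assumptions not taken from the specification: pixels are (R, G, B) tuples, the parameter has the hypothetical form (channel index, threshold), and each stored item is a dict with hypothetical keys `position`, `size`, `color_param` and `data`. The specification does not say how the degree of matching is computed; the fraction of agreeing pixels used here is only one plausible choice.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]


def cut_out(image: List[List[Pixel]], position, size):
    """Step S1203: cut out the rectangle given by position and size."""
    x, y = position
    w, h = size
    return [row[x:x + w] for row in image[y:y + h]]


def separate(region, color_param) -> List[List[int]]:
    """Step S1204: keep only the designated color constituent as 1/0 data."""
    channel, threshold = color_param
    return [[1 if px[channel] < threshold else 0 for px in row] for row in region]


def degree_of_matching(extracted, registered) -> float:
    """Step S1205: fraction of positions where the two binary patterns agree.

    Positions outside the smaller of the two patterns are ignored.
    """
    pairs = [(a, b) for ra, rb in zip(extracted, registered)
             for a, b in zip(ra, rb)]
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


def match_one(image, info: Dict) -> float:
    """Steps S1203-S1205 for one stored piece of discriminating information."""
    data = separate(cut_out(image, info["position"], info["size"]),
                    info["color_param"])
    return degree_of_matching(data, info["data"])
```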

This completes the calculation of the degree of matching for one piece of document discriminating information of the first document type. Next, the degree of matching is calculated sequentially for each of the remaining pieces of document discriminating information, until it has been calculated for all of them. Then, the degree of matching of each piece of discriminating information is compared with the degree of matching set at the threshold setting unit 206, and whether the degrees of matching of all the discriminating information meet the judgment criteria is determined (step S1206).
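The judgment in step S1206 requires every piece of discriminating information for a document type to reach the degree of matching set at the threshold setting unit 206. A minimal sketch, assuming the degrees are fractions between 0 and 1 and that a single threshold applies to all of them (`meets_criteria` is a hypothetical name):

```python
from typing import Sequence


def meets_criteria(degrees: Sequence[float], threshold: float) -> bool:
    """Step S1206: True only if there is at least one degree of matching
    and every one of them reaches the threshold."""
    return len(degrees) > 0 and all(d >= threshold for d in degrees)
```

Requiring every item to pass, rather than averaging, means that a single area that does not match is enough to reject a document type, which is consistent with designating several areas to tighten identification.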

If the degrees of matching of all the discriminating information meet the judgment criteria (Y), the document read with the image reader 101 is identified as the first document type, which has been read before, and the identified document type is recorded in a memory (not shown) within the control unit 201 as the identification result (step S1207). Conversely, if the degrees of matching of all the discriminating information do not meet the judgment criteria (N), no identification has been made, and therefore no record is made in step S1207.

This completes the discriminating operation for the first document type stored in the document discriminating dictionary unit 204. The same discriminating operation is to be executed for all the document types stored in the document discriminating dictionary unit 204, and whether it has been completed for all of them is determined (step S1208).

If the discriminating operation has not yet been completed for all the document types stored in the document discriminating dictionary unit 204 (N), the operations from steps S1203 to S1208 are repeated until it has been completed for all the document types stored in the document discriminating dictionary unit 204.
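The loop over document types (steps S1202 to S1208) can be sketched as below. This is an illustration, not the apparatus's code: the dictionary is assumed to be an ordered mapping from a document ID to its stored discriminating information, and the per-item scorer is passed in as the hypothetical callable `match_one`, so the sketch does not depend on any particular extraction or matching rule.

```python
from typing import Callable, Dict, List, Sequence


def discriminate(image,
                 dictionary: Dict[str, Sequence[dict]],
                 match_one: Callable[[object, dict], float],
                 threshold: float) -> Dict[str, List[float]]:
    """Steps S1202-S1208 over every registered document type.

    Returns the recorded document types (step S1207) mapped to their
    degrees of matching; types that fail the criteria are not recorded.
    """
    recorded: Dict[str, List[float]] = {}
    for doc_id, infos in dictionary.items():                  # S1202: stored order
        degrees = [match_one(image, info) for info in infos]  # S1203-S1205
        if degrees and all(d >= threshold for d in degrees):  # S1206
            recorded[doc_id] = degrees                        # S1207
    return recorded                                           # S1208: all types tried
```

Every type is tried even after one has matched, which is what allows more than one document type to be recorded and the closest one to be chosen afterwards.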

If the discriminating operation has been completed for all the document types (Y), whether any document type was recorded in step S1207 is determined (step S1209). If one or more document types have been recorded (Y), the operator is notified of the document type, or it is displayed on the display 104: the recorded document type if there is only one, or the closest document type if a plurality of document types have been recorded (step S1210).

Conversely, if step S1209 finds that no document type was recorded, in other words, that no specific document type was identified (N), the operator is notified, or the display shows, that the type of the document read with the image reader 101 cannot be specified.
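Steps S1209 and S1210 reduce to choosing what to report from the recorded document types. The specification says only that the "closest" type is reported when several are recorded; the sketch below assumes, purely for illustration, that closeness means the highest mean degree of matching. `closest_type` and `message` are hypothetical names, and the input is assumed to map document IDs to their lists of degrees of matching.

```python
from statistics import mean
from typing import Dict, List, Optional


def closest_type(recorded: Dict[str, List[float]]) -> Optional[str]:
    """Steps S1209-S1210: pick the document type to report.

    Returns None when nothing was recorded, the only recorded type when
    there is one, and otherwise the type with the highest mean degree of
    matching (an assumed closeness measure).
    """
    if not recorded:
        return None
    return max(recorded, key=lambda doc_id: mean(recorded[doc_id]))


def message(recorded: Dict[str, List[float]]) -> str:
    """Text shown to the operator for the discrimination result."""
    doc_id = closest_type(recorded)
    if doc_id is None:
        return "The document type cannot be specified."
    return f"Document type: {doc_id}"
```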

When the discrimination of the document read with the image data reader 101 has been completed in this way, the next document to be discriminated is read with the image data reader 101, and the discriminating operations are repeated.
Effectiveness of the Invention 

As described above, the document discriminating apparatus according to the present invention comprises the image data reader 101, the image data storing memory 202, the hard disk 103, the image data cutting out unit 203, the document discriminating dictionary unit 204, the data comparison unit 205 and the document determination unit 207, and is additionally provided with the color constituent extracting unit and the color constituent separating unit. The color constituents of the data cut out, as document discriminating information, from the image data read from the document are separated, and the color constituent exhibiting the characteristics is designated from the density distributions of the respective color constituents, so that the color separation parameter can be set.
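The specification states only that the characteristic color constituent is designated from the density distributions of the respective color constituents. One way to turn that into a parameter, shown purely as an assumption, is to build a density histogram for each of R, G and B, split each with Otsu's method, and keep the channel whose split separates its two classes most clearly. Otsu's method is a swapped-in, well-known thresholding technique, not the method disclosed here; `otsu` and `set_color_param` are hypothetical names, and the returned (channel index, threshold) pair follows the assumed convention that a pixel belongs to the data when its channel value is below the threshold.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def otsu(values: Sequence[int]) -> Tuple[int, float]:
    """Otsu's method on 0-255 densities.

    Returns (t, score): values <= t form the dark class, and score is the
    unnormalized between-class variance, or -1.0 if no split exists.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = sum0 = 0
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        score = w0 * w1 * (m0 - m1) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


def set_color_param(region: List[List[Pixel]]) -> Tuple[int, int]:
    """Choose the channel whose density distribution splits most clearly."""
    pixels = [px for row in region for px in row]
    best = None  # (channel, threshold, score)
    for channel in range(3):
        t, score = otsu([px[channel] for px in pixels])
        if best is None or score > best[2]:
            # t + 1 so that "value < threshold" selects the dark class
            best = (channel, t + 1, score)
    return best[0], best[1]
```

For red print (200, 30, 30) on a white background, the red channel barely separates ink from paper, while the green channel separates them strongly, so green is chosen with a threshold just above the ink density.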

Because the color separation parameter can be set, the color information can be replaced with monochromatic information, with which colors can easily be compared and discriminated. Thus, faint spots of color in the area to be extracted and variations in tone between documents at the time of printing and reading can be properly dealt with, and furthermore, the influence of the background color can be eliminated. As a result, the accuracy of the document discriminating information can be further improved.

In discriminating documents, the type of a document whose image data is read with the image data reader 101 can be discriminated automatically, and the accuracy of the data cut out from the image data can be improved by the color separation parameter based on the color constituent extraction. Even if a plurality of types of documents are to be read with the image reader 101, the operator can process the documents without paying attention to the definition of each document, which improves the efficiency of the processing job. In addition, there is no need to write an ID number on the document itself to discriminate it, so that general types of documents can be used; the document discriminating apparatus of the present invention can thus be applied easily to existing systems.

Additionally, when document discriminating information is registered in the document discriminating dictionary unit 204, the required document discriminating information is taken in automatically as soon as the operator designates it while viewing, on the display 104, the image data of the document to be registered. The dictionary for discriminating documents can therefore be prepared easily, which increases the efficiency of the document processing job.

Furthermore, since the document discriminating information extracted through the operator's designation can be designated at a plurality of locations on the document, the document can be identified with higher accuracy than with document discriminating information designated at a single location.

What is claimed is:

1. An apparatus for discriminating a document with data information, said apparatus comprising:

image reading means for reading image data from a document prepared in an optional format;

image data cutting out means for cutting out data corresponding to a designated specified portion of said document from said image data read by said image reading means;

color constituent extracting means for analyzing color constituents of said image data cut out by said cutting out means and setting a color separation parameter for a specific color constituent; and

color constituent separating means for producing said data information for said specific portion from said image data cut out based on said color separation parameter from said color constituent extracting means.

2. An apparatus for discriminating a document as set forth in Claim 1, wherein said color separation parameter set by said color constituent extracting means is stored in a document discriminating dictionary unit together with said data information.

3. An apparatus for discriminating a document as set forth in Claim 2, further comprising a document determination means for comparing for determination said data information prepared from image data obtained by reading a document to be discriminated based on said color separation parameter with said data information stored in said document discriminating dictionary unit.

4. A method for discriminating a document prepared in an optional format based on image data read from said document, said method comprising the steps of:

cutting out image data corresponding to a designated specified portion of said document;

analyzing color constituents of said image data so cut out, selecting a specific color constituent and setting a color separation parameter; and

preparing data information for said specified portion from said cut out image data based on said color separation parameter; whereby said document is discriminated by said data information.

5. A method for discriminating a document as set forth in Claim 4, wherein said color constituents are analyzed in terms of three primary colors, one of said three primary colors is selected as said specific color constituent, and said color separation parameter is determined based on density distributions of said three primary colors.

6. A method for discriminating a document as set forth in Claim 4, wherein said color separation parameter is stored in the document discriminating dictionary unit together with said data information.

7. A method for discriminating a document as set forth in Claim 6, wherein data information is prepared from image data obtained by reading a document to be discriminated based on said color separation parameter, and wherein said data information so prepared is compared for determination with said data information stored in said document discriminating dictionary unit.

ABSTRACT

The invention relates to a document discriminating apparatus and a discriminating method for use in processing documents at financial institutions. A characteristic portion inherent in an optional format is cut out from image data read from a document in the optional format. Color constituents of the cut out image data are analyzed, a color constituent exhibiting characteristics is selected from the constituents, and a color separation parameter is set for the selected color constituent. Data information is prepared which is related to the image data cut out based upon the color separation parameter. On the other hand, data information is prepared from image data obtained by reading a document to be discriminated based on the color separation parameter. Then, the data information is compared for determination with the data information stored in the document discriminating dictionary unit, whereby the document is discriminated.
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Robert B. Murray. Reg. No. 22,980^"E. Marcie Emas, Reg. No. 32,13J^ Dougias H. 
Goldhush. Reg. No.^3aa25LMomca Chin Kitts. Reg. No. 3 6,1057 g ichard J. 
Bcrman, R<^. No. 3a4fl2rKmg L. Wong, Reg. No. 3 7,500: J Carcn K. Costantlno. 
R^. No. 3S40Z;.Jame$ A. Poulos, m, R^. No. 3XJ14^V9inx^ D. Minr. Reg. 
No. 3T»4CB;. .Sharon N. Klesner, Reg. No. 3fi32&aiid Marat Ozga^ Rec. No. 44.275: 
Bradl^ D. Goldizeii, R^. No. 4 3>«n;aad N. Atexander Note, Reg. No. 45,6S9. 



Please direct all communications to the following address:
ARENT FOX KINTNER PLOTKIN & KAHN, PLLC
1050 Connecticut Avenue, N.W., Suite 600
Washington, D.C. 20036-5339
Tel: (202) 857-6000; Fax: (202) 857-6395





Full name of sole or first inventor
Takayuki Matsui

Inventor's signature / Date: February 8, 2002
Residence: Kawasaki, Japan
Citizenship: Japanese
Post Office Address: c/o FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa 211-8588 Japan

Full name of second joint inventor, if any

Yufeaka Ka1-.5i!^|^1-;. ^ 




Second inventor's signature / Date: February 8, 2002
Residence: Kawasaki, Japan
Citizenship: Japanese
Post Office Address: c/o FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa 211-8588 Japan

(Supply similar information and signature for third and subsequent joint inventors.)

Full name of third joint inventor, if any

Kazunori Yamamoto 






Third inventor's signature / Date: February 8, 2002
Residence: Kawasaki, Japan
Citizenship: Japanese
Post Office Address: c/o FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa 211-8588 Japan

Full name of fourth joint inventor, if any

Shinichi Ecaichi 






Fourth inventor's signature / Date: February 8, 2002
Residence: Kawasaki, Japan
Citizenship: Japanese
Post Office Address: c/o FUJITSU LIMITED, 1-1, Kamikodanaka 4-chome, Nakahara-ku, Kawasaki-shi, Kanagawa 211-8588 Japan





